
Conversation

@apetresc (Contributor) commented Sep 7, 2016

What changes were proposed in this pull request?

The Scala version of SparkContext has a handy field called uiWebUrl that tells you which URL the SparkUI spawned by that instance lives at. This is often very useful because the value of spark.ui.port in the config is only a suggestion: if that port is already taken by another Spark instance on the same machine, Spark just keeps incrementing the port until it finds a free one. So, on a machine with many running PySpark instances, you often have to try the UIs one by one until you find your application name.

Scala users have a way around this with uiWebUrl, but Java and Python users do not. This pull request fixes that in the most straightforward way possible, simply propagating the field through JavaSparkContext and into PySpark via the Java gateway.
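
For illustration, a minimal sketch of the intended PySpark usage; the master URL and the printed address below are made-up examples:

from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="ui-url-demo")

# With this change, the address the UI actually bound to is one attribute
# away, even when spark.ui.port was incremented past a conflict.
print(sc.uiWebUrl)  # e.g. http://192.168.1.10:4041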

Please let me know if any additional documentation/testing is needed.

How was this patch tested?

Existing tests were run to make sure there were no regressions, and a binary distribution was created and tested manually for the correct value of sc.uiWebUrl in a variety of circumstances.

@srowen (Member) commented Sep 7, 2016

That's probably OK; Java users can already pretty easily get this anyway but not Python users.

@srowen (Member) commented Sep 7, 2016

Jenkins test this please

@SparkQA commented Sep 7, 2016

Test build #65057 has finished for PR 15000 at commit 9c57eb7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zjffdu (Contributor) commented Sep 8, 2016

LGTM

@srowen (Member) commented Sep 12, 2016

My only hesitation is that this property really only exists so it can be printed in the shell. Is there a good use case for it otherwise? I know it's minor, but I want to make sure we're not just doing this for parity.

@apetresc (Contributor, Author) commented:

Well, here's the use case I want it for: I'm building some plugins for JupyterHub to make it more Spark-aware, and I want to be able to link the user out to the right web UI for their kernel. Short of somehow making the launcher override the user's own SPARK_CONF_DIR to set the port manually to one that I already know is open, there's no other way to do that. But I do have access to the SparkContext, so with this property I can create the link effortlessly.
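
A minimal sketch of that use case, assuming the uiWebUrl property from this pull request; the spark_ui_link helper is hypothetical, not part of any actual JupyterHub plugin:

from IPython.display import HTML

def spark_ui_link(sc):
    # sc is a live pyspark.SparkContext; uiWebUrl reflects the port the UI
    # actually bound to, even if spark.ui.port was bumped past a conflict.
    return HTML('<a href="{0}" target="_blank">Spark UI</a>'.format(sc.uiWebUrl))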

@srowen (Member) commented Sep 12, 2016

Is this Java or Pyspark? In Java you can still get this property directly from the underlying SparkContext.

@apetresc (Contributor, Author) commented:

PySpark. I don't think anyone runs Java through Jupyter, haha.

@srowen (Member) commented Sep 12, 2016

Ah, right, dumb question. Yeah, I think it makes some sense ... maybe not even for Java, because there are lots of methods we don't plumb through since you can easily access them directly from Scala. Python, OK.

@apetresc (Contributor, Author) commented:

@srowen: Just to make sure I understand, are you asking me to remove the Java accessor here, and just plumb straight through to the Scala object from PySpark? Or is it fine as-is?

@srowen (Member) commented Sep 14, 2016

Looking at context.py, it seems that it accesses things directly from SparkContext via _jsc.sc() where possible. I think that means you can just expose this in PySpark without exposing it separately in JavaSparkContext.

@zjffdu (Contributor) commented Sep 14, 2016

@srowen _jsc.sc() is JavaSparkContext; I think that's why @apetresc exposed it in JavaSparkContext first.

@srowen (Member) commented Sep 14, 2016

_jsc is a JavaSparkContext and .sc() is a SparkContext. That's accessible, but maybe you'll tell me that the Py4J wrapper won't play nice with Scala Option? Not sure. See how other similar fields are handled, I guess.

@zjffdu (Contributor) commented Sep 14, 2016

Yeah, I just tried the following statement and it works. But I think there is no harm in exposing it in JavaSparkContext as well.

sc._jsc.sc().uiWebUrl().get()
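
Since SparkContext.uiWebUrl is a Scala Option[String], a more defensive version of that call would guard against the empty case; that the Option is empty when the UI is disabled via spark.ui.enabled=false is an assumption, not something tested in this PR:

opt = sc._jsc.sc().uiWebUrl()
url = opt.get() if opt.isDefined() else None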

@srowen (Member) commented Sep 14, 2016

I don't see value in exposing it, and many other things aren't exposed via JSC. It's really only for things that need a different API to make sense in Java.

@apetresc (Contributor, Author) commented:

As requested, removed the Java property and left only the PySpark one.

I'll admit I didn't originally appreciate that you could access the Scala SparkContext straight from PySpark (I figured the property would have to be propagated through the Java wrapper first), so the urgency of this patch is much lessened now that I know I can just call sc._jsc.sc().uiWebUrl().get() manually anyway. But it would still be nice to have a simple property for this instead :)
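
For reference, a sketch of what the PySpark-only property can look like in pyspark/context.py, plumbing straight through to the Scala SparkContext; the docstring wording is an assumption:

@property
def uiWebUrl(self):
    """Return the URL of the SparkUI instance started by this SparkContext."""
    return self._jsc.sc().uiWebUrl().get()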

@srowen (Member) commented Sep 15, 2016

It's still probably reasonable to plumb it through but I'll leave it open a bit for comments.

@srowen (Member) commented Sep 18, 2016

@davies do you have an opinion?

@SparkQA commented Sep 20, 2016

Test build #3280 has finished for PR 15000 at commit cb13f43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 20, 2016

merged to master

@asfgit asfgit closed this in 4a426ff Sep 20, 2016
@apetresc apetresc deleted the pyspark-uiweburl branch September 20, 2016 15:53
@dmvieira commented:

Hey guys, how can I do the same thing using SparkR?
